Neural network system for gesture, wear, activity, or carry detection on a wearable or mobile device

ABSTRACT

A neural network system includes an eyewear device. The eyewear device has a movement tracker, such as an accelerometer, gyroscope, or an inertial measurement unit for measuring acceleration and rotation. The neural network system tracks, via the movement tracker, movement of the eyewear device from at least one finger contact inputted from a user on an input surface. The neural network system identifies a finger gesture by detecting at least one detected touch event based on variation of the tracked movement of the eyewear device over a time period. The neural network system adjusts an image presented on an image display of the eyewear device based on the identified finger gesture. The neural network system can also detect whether the user is wearing the eyewear device and identify an activity of the user wearing the eyewear device based on the variation of the tracked movement over the time period.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a Continuation of U.S. application Ser. No. 17/245,997 filed on Apr. 30, 2021, which is a Divisional of U.S. patent application Ser. No. 16/569,884, filed on Sep. 13, 2019, now U.S. Pat. No. 10,996,846, and claims priority to U.S. Provisional Application Ser. No. 62/738,093 filed on Sep. 28, 2018, the contents of all of which are incorporated fully herein by reference.

TECHNICAL FIELD

The present subject matter relates to wearable devices, e.g., eyewear devices, and mobile devices, and to techniques for detecting user gestures and wear.

BACKGROUND

Computing devices, such as wearable devices, including portable eyewear devices (e.g., smartglasses, headwear, and headgear), necklaces, and smartwatches, and mobile devices (e.g., tablets, smartphones, and laptops), integrate image displays and cameras. Viewing and interacting with the displayed content on these devices can be difficult due to the small image display area available on the wearable device and mobile device.

For example, size limitations and the form factor of a wearable device, such as the eyewear device, can make a user interface difficult to incorporate into the eyewear device. The available area for placement of various control buttons on an eyewear device, e.g., to operate a camera, and graphical user interface elements on the image display of the eyewear device is limited. Due to the small form factor of the eyewear device, manipulating and interacting with, for example, displayed content on an image display is cumbersome.

Accordingly, a need exists to simplify user interactions with wearable devices, including eyewear devices, and mobile devices.

BRIEF DESCRIPTION OF THE DRAWINGS

The drawing figures depict one or more implementations, by way of example only, not by way of limitation. In the figures, like reference numerals refer to the same or similar elements.

FIG. 1A is a right side view of an example hardware configuration of an eyewear device, which includes a movement tracker, utilized in a neural network system for gesture, wear, and activity detection.

FIGS. 1B and 1C are rear views of example hardware configurations of the eyewear device, including two different types of image displays.

FIG. 1D depicts a schematic view of operation of the movement tracker of the eyewear device connected to an input surface to track movement of the eyewear device from at least one finger contact inputted from a user.

FIG. 2 is a top cross-sectional view of a left chunk of the eyewear device of FIGS. 1B and 1C depicting the movement tracker and a circuit board.

FIG. 3A is a high-level functional block diagram of an example neural network system including the eyewear device with the movement tracker to identify finger gestures based on a neural network model, a mobile device, and a server system connected via various networks.

FIG. 3B shows an example of a hardware configuration for the server system of the neural network system of FIG. 3A to build a neural network model for identifying finger gestures, in simplified block diagram form.

FIG. 3C is a high-level functional block diagram of an example neural network system including the eyewear device with the movement tracker to detect wear based on a neural network model, a mobile device, and a server system connected via various networks.

FIG. 3D shows an example of a hardware configuration for the server system of the neural network system of FIG. 3C to build a neural network model for wear detection, in simplified block diagram form.

FIG. 3E is a high-level functional block diagram of an example neural network system including the eyewear device with the movement tracker to identify activities based on a neural network model during wearing of the eyewear device, a mobile device, and a server system connected via various networks.

FIG. 3F shows an example of a hardware configuration for the server system of the neural network system of FIG. 3E to build a neural network model for identifying activities during wearing of the eyewear device, in simplified block diagram form.

FIG. 4A shows an example of a hardware configuration for the wearable device or the mobile device of the neural network system of FIGS. 3A-B, which includes the movement tracker to identify finger gestures based on a neural network model.

FIG. 4B shows an example of a hardware configuration for the wearable device or the mobile device of the neural network system of FIGS. 3C-D, which includes the movement tracker to detect wear of the wearable device or carrying of the mobile device based on a neural network model.

FIG. 4C shows an example of a hardware configuration for the wearable device or the mobile device of the neural network system of FIGS. 3E-F, which includes the movement tracker to identify activities based on a neural network model.

FIG. 5 is a flowchart of a method that can be implemented in the neural network system 300 of FIGS. 3B, 3D, and 3F to optimize execution speed and efficiency of the generated neural network model for gesture, wear, carry, or activity identification.

FIGS. 6A, 6B, and 6C illustrate press and hold detected touch events on the input surface of the eyewear device.

FIG. 7 illustrates finger pinching and unpinching detected touch events on the input surface of the eyewear device.

FIG. 8 illustrates finger rotation detected touch events on the input surface of the eyewear device.

FIG. 9 illustrates finger swiping detected touch events on the input surface of the eyewear device.

DETAILED DESCRIPTION

In the following detailed description, numerous specific details are set forth by way of examples in order to provide a thorough understanding of the relevant teachings. However, it should be apparent to those skilled in the art that the present teachings may be practiced without such details. In other instances, descriptions of well-known methods, procedures, components, and circuitry are set forth at a relatively high level, without detail, in order to avoid unnecessarily obscuring aspects of the present teachings.

The term “coupled” or “connected” as used herein refers to any logical, optical, physical or electrical connection, link or the like by which electrical or magnetic signals produced or supplied by one system element are imparted to another coupled or connected element. Unless described otherwise, coupled or connected elements or devices are not necessarily directly connected to one another and may be separated by intermediate components, elements or communication media that may modify, manipulate or carry the electrical signals. The term “on” means directly supported by an element or indirectly supported by the element through another element integrated into or supported by the element.

The orientations of the eyewear device, associated components and any complete devices incorporating a movement tracker such as shown in any of the drawings are given by way of example only, for illustration and discussion purposes. In operation for gesture, wear, or activity detection, the eyewear device may be oriented in any other direction suitable to the particular application of the eyewear device, for example up, down, sideways, or any other orientation. Also, to the extent used herein, any directional term, such as front, rear, inwards, outwards, towards, left, right, lateral, longitudinal, up, down, upper, lower, top, bottom, side, horizontal, vertical, and diagonal, is used by way of example only and is not limiting as to the direction or orientation of any movement tracker or component of the movement tracker constructed as otherwise described herein.

Additional objects, advantages and novel features of the examples will be set forth in part in the following description, and in part will become apparent to those skilled in the art upon examination of the following and the accompanying drawings or may be learned by production or operation of the examples. The objects and advantages of the present subject matter may be realized and attained by means of the methodologies, instrumentalities and combinations particularly pointed out in the appended claims.

Reference now is made in detail to the examples illustrated in the accompanying drawings and discussed below.

FIG. 1A is a right side view of an example hardware configuration of an eyewear device 100, which includes a movement tracker (not shown), utilized in a neural network system for gesture, wear, and activity detection. Eyewear device 100 includes a right optical assembly 180B with an image display to present images. As shown in FIG. 1A, the eyewear device 100 includes the right visible light camera 114B. Eyewear device 100 can include multiple visible light cameras 114A-B that form a passive type of depth-capturing camera, such as a stereo camera, of which the right visible light camera 114B is located on a right chunk 110B. The eyewear device 100 can also include a left visible light camera 114A on a left chunk 110A. The depth-capturing camera can alternatively be an active type of depth-capturing camera that includes a single visible light camera 114B and a depth sensor (e.g., an infrared camera and an infrared emitter).

Left and right visible light cameras 114A-B are sensitive to the visible light range wavelength. Each of the visible light cameras 114A-B has a different frontward-facing field of view, and the fields of view overlap to allow three-dimensional depth images to be generated; for example, the right visible light camera 114B has the depicted right field of view 111B. Generally, a “field of view” is the part of the scene that is visible through the camera at a particular position and orientation in space. Objects or object features outside the field of view 111A-B when the image is captured by the visible light camera are not recorded in a raw image (e.g., photograph or picture). The field of view describes the angle range or extent over which the image sensor of the visible light camera 114A-B picks up electromagnetic radiation of a given scene in a captured image of the given scene. Field of view can be expressed as the angular size of the view cone, i.e., an angle of view. The angle of view can be measured horizontally, vertically, or diagonally.

In an example, visible light cameras 114A-B have a field of view with an angle of view between 15° and 30°, for example 24°, and have a resolution of 480×480 pixels. The “angle of coverage” describes the angle range that a lens of the visible light cameras 114A-B or infrared camera can effectively image. Typically, the image circle produced by a camera lens is large enough to cover the film or sensor completely, possibly including some vignetting toward the edge. If the angle of coverage of the camera lens does not fill the sensor, the image circle will be visible, typically with strong vignetting toward the edge, and the effective angle of view will be limited to the angle of coverage.

Examples of such visible light cameras 114A-B include a high-resolution complementary metal-oxide-semiconductor (CMOS) image sensor and a video graphics array (VGA) camera, such as 640p (e.g., 640×480 pixels for a total of 0.3 megapixels), 720p, or 1080p. As used herein, the term “overlapping” when referring to field of view means the matrix of pixels in the generated raw image(s) or infrared image of a scene overlap by 30% or more. As used herein, the term “substantially overlapping” when referring to field of view means the matrix of pixels in the generated raw image(s) or infrared image of a scene overlap by 50% or more.

Image sensor data from the visible light cameras 114A-B are captured along with geolocation data, digitized by an image processor, and stored in a memory. The left and right raw images captured by the respective visible light cameras 114A-B are in the two-dimensional space domain and comprise a matrix of pixels on a two-dimensional coordinate system that includes an X axis for horizontal position and a Y axis for vertical position. Each pixel includes a color attribute (e.g., a red pixel light value, a green pixel light value, and/or a blue pixel light value) and a position attribute (e.g., an X location coordinate and a Y location coordinate).

To provide stereoscopic vision, visible light cameras 114A-B may be coupled to an image processor (element 312 of FIG. 3A) for digital processing along with a timestamp in which the image of the scene is captured. Image processor 312 includes circuitry to receive signals from the visible light cameras 114A-B and process those signals from the visible light cameras 114A-B into a format suitable for storage in the memory. The timestamp can be added by the image processor or other processor, which controls operation of the visible light cameras 114A-B. Visible light cameras 114A-B allow the depth-capturing camera to simulate human binocular vision. The depth-capturing camera provides the ability to reproduce three-dimensional images based on two captured images from the visible light cameras 114A-B having the same timestamp. Such three-dimensional images allow for an immersive, life-like experience, e.g., for virtual reality or video gaming.

For stereoscopic vision, a pair of raw red, green, and blue (RGB) images are captured of a scene at a given moment in time—one image for each of the left and right visible light cameras 114A-B. When the pair of captured raw images from the frontward-facing left and right fields of view 111A-B of the left and right visible light cameras 114A-B are processed (e.g., by the image processor 312 of FIG. 3A), depth images are generated, and the generated depth images can be perceived by a user on the optical assembly 180A-B or other image display(s) (e.g., of a mobile device). The generated depth images are in the three-dimensional space domain and can comprise a matrix of vertices on a three-dimensional location coordinate system that includes an X axis for horizontal position (e.g., length), a Y axis for vertical position (e.g., height), and a Z axis for depth (e.g., distance). Each vertex includes a color attribute (e.g., a red pixel light value, a green pixel light value, and/or a blue pixel light value); a position attribute (e.g., an X location coordinate, a Y location coordinate, and a Z location coordinate); a texture attribute; and/or a reflectance attribute. The texture attribute quantifies the perceived texture of the depth image, such as the spatial arrangement of color or intensities in a region of vertices of the depth image.

FIGS. 1B-C are rear views of example hardware configurations of the eyewear device 100, including two different types of image displays. Eyewear device 100 is in a form configured for wearing by a user, which is eyeglasses in the example. The eyewear device 100 can take other forms and may incorporate other types of frameworks, for example, headgear, a headset, or a helmet.

In the eyeglasses example, eyewear device 100 includes a frame 105 including a left rim 107A connected to a right rim 107B via a bridge 106 adapted for a nose of the user. The left and right rims 107A-B include respective apertures 175A-B, which hold a respective optical element 180A-B, such as a lens and a display device. As used herein, the term lens is meant to cover transparent or translucent pieces of glass or plastic having curved and/or flat surfaces that cause light to converge/diverge or that cause little or no convergence or divergence.

Although shown as having two optical elements 180A-B, the eyewear device 100 can include other arrangements, such as a single optical element, or may not include any optical element 180A-B depending on the application or intended user of the eyewear device 100. As further shown, eyewear device 100 includes a left chunk 110A adjacent the left lateral side 170A of the frame 105 and a right chunk 110B adjacent the right lateral side 170B of the frame 105. The chunks 110A-B may be integrated into the frame 105 on the respective lateral sides 170A-B (as illustrated) or implemented as separate components attached to the frame 105 on the respective sides 170A-B. Alternatively, the chunks 110A-B may be integrated into temples (not shown) attached to the frame 105.

In one example, the image display of optical assembly 180A-B includes an integrated image display. As shown in FIG. 1B, the optical assembly 180A-B includes a display matrix 170 of any suitable type, such as a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or any other such display. The optical assembly 180A-B also includes an optical layer or layers 176, which can include lenses, optical coatings, prisms, mirrors, waveguides, optical strips, and other optical components in any combination. The optical layers 176A-N can include a prism having a suitable size and configuration and including a first surface for receiving light from the display matrix and a second surface for emitting light to the eye of the user. The prism of the optical layers 176A-N extends over all or at least a portion of the respective apertures 175A-B formed in the left and right rims 107A-B to permit the user to see the second surface of the prism when the eye of the user is viewing through the corresponding left and right rims 107A-B. The first surface of the prism of the optical layers 176A-N faces upwardly from the frame 105, and the display matrix overlies the prism so that photons and light emitted by the display matrix impinge on the first surface. The prism is sized and shaped so that the light is refracted within the prism and is directed towards the eye of the user by the second surface of the prism of the optical layers 176A-N. In this regard, the second surface of the prism of the optical layers 176A-N can be convex to direct the light towards the center of the eye. The prism can optionally be sized and shaped to magnify the image projected by the display matrix 170, and the light travels through the prism so that the image viewed from the second surface is larger in one or more dimensions than the image emitted from the display matrix 170.

In another example, the image display device of optical assembly 180A-B includes a projection image display as shown in FIG. 1C. The optical assembly 180A-B includes a laser projector 150, which is a three-color laser projector using a scanning mirror or galvanometer. During operation, an optical source such as a laser projector 150 is disposed in or on one of the temples 125A-B of the eyewear device 100. Optical assembly 180A-B includes one or more optical strips 155A-N spaced apart across the width of the lens of the optical assembly 180A-B or across a depth of the lens between the front surface and the rear surface of the lens.

As the photons projected by the laser projector 150 travel across the lens of the optical assembly 180A-B, the photons encounter the optical strips 155A-N. When a particular photon encounters a particular optical strip, the photon is either redirected towards the user's eye, or it passes to the next optical strip. A combination of modulation of the laser projector 150 and modulation of the optical strips may control specific photons or beams of light. In an example, a processor controls optical strips 155A-N by initiating mechanical, acoustic, or electromagnetic signals. Although shown as having two optical assemblies 180A-B, the eyewear device 100 can include other arrangements, such as a single or three optical assemblies, or the optical assembly 180A-B may have a different arrangement depending on the application or intended user of the eyewear device 100.

As further shown in FIGS. 1B-C, eyewear device 100 includes a left chunk 110A adjacent the left lateral side 170A of the frame 105 and a right chunk 110B adjacent the right lateral side 170B of the frame 105. The chunks 110A-B may be integrated into the frame 105 on the respective lateral sides 170A-B (as illustrated) or implemented as separate components attached to the frame 105 on the respective sides 170A-B. Alternatively, the chunks 110A-B may be integrated into temples 125A-B attached to the frame 105. As used herein, the chunks 110A-B can include an enclosure that encloses a collection of processing units, cameras, sensors, etc. (e.g., different for the right and left sides).

In one example, the image display includes a first (left) image display and a second (right) image display. Eyewear device 100 includes first and second apertures 175A-B, which hold a respective first and second optical assembly 180A-B. The first optical assembly 180A includes the first image display (e.g., a display matrix 170A of FIG. 1B; or optical strips 155A-N′ and a projector 150A of FIG. 1C). The second optical assembly 180B includes the second image display (e.g., a display matrix 170B of FIG. 1B; or optical strips 155A-N″ and a projector 150B of FIG. 1C).

FIG. 1D depicts a schematic view of operation of a movement tracker of the eyewear device 100 connected to an input surface 181 to track movement of the eyewear device 100 from at least one finger contact 179 inputted from a user. The input surface 181 is formed of plastic, acetate, or another insulating material that forms a substrate of the frame 105, the temples 125A-B, or the lateral sides 170A-B.

While touch screens exist for mobile devices, such as tablets and smartphones, utilization of a touch screen in the lens of the eyewear device 100 can interfere with the line of sight of the user of the eyewear device 100 and hinder the user's view. For example, finger touches can smudge the optical assembly 180A-B (e.g., optical layers, image display, and lens) and cloud or obstruct the user's vision. To avoid creating blurriness and poor clarity when the user's eyes look through the transparent portion of the optical assembly 180A-B, changes in rotational acceleration, motion, spatial orientation, and other measurement features collected by the movement tracker 118 resulting from finger contact on an input surface 181 can be utilized to detect finger touch gestures. Touch gestures are inputs to the human-machine interface of the eyewear device 100 to perform specific actions in applications executing on the eyewear device 100 or to navigate through displayed images in an intuitive manner, which enhances and simplifies the user experience.

When utilized in the neural network system, such a movement tracker 118 (which can be an inertial measurement unit, accelerometer, or gyroscope) that is already incorporated into the eyewear device 100 can save the cost of additional touch sensor circuitry (e.g., capacitive or resistive type touch sensors). Using measurements taken by the movement tracker 118 in a neural network model also eliminates the space otherwise required to incorporate a touch sensor circuit in the eyewear device 100, which reduces the form factor of the eyewear device 100. The system of neural network detection can, however, also be used with both a touch sensor and the movement tracker 118 (e.g., IMU). The benefit of neural networks in this context is their ability to recognize complex gestures and activities that are otherwise impossible to detect with handwritten heuristics-based code.

Detection of finger gestures via the neural network model that uses measurements taken by the movement tracker 118 as a model input layer can enable several functions. For example, touching anywhere on the input surface 181 may highlight an item on the screen of the image display of the optical assembly 180A-B. Double tapping on the input surface 181 may select an item. Sliding (or swiping) a finger from front to back may slide or scroll in one direction, for example, to move to a previous video, image, page, or slide. Sliding the finger from back to front may slide or scroll in the opposite direction, for example, to move to a next video, image, page, or slide. Pinching with two fingers may provide a zoom-in function to zoom in on content of a displayed image. Unpinching with two fingers provides a zoom-out function to zoom out of content of a displayed image. The input surface 181 can be virtually anywhere on the eyewear device 100. To detect finger sliding, pinching, and unpinching, a touch sensor incorporated in the eyewear device 100 may also be used with the movement tracker 118.
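
The following is a minimal sketch of such a gesture-to-action mapping. The gesture labels follow the examples just described, but the label strings and action names are hypothetical illustrations, not an API defined by this disclosure.

```python
# Illustrative only: maps recognized gesture labels (from the examples above)
# to display actions. The labels and action strings are assumptions.

GESTURE_ACTIONS = {
    "touch": "highlight_item",
    "double_tap": "select_item",
    "swipe_front_to_back": "scroll_previous",
    "swipe_back_to_front": "scroll_next",
    "pinch": "zoom_in",
    "unpinch": "zoom_out",
}

def action_for_gesture(gesture: str) -> str:
    """Return the display action for an identified finger gesture."""
    return GESTURE_ACTIONS.get(gesture, "no_op")

# Example: a back-to-front swipe advances to the next video, image, page, or slide.
assert action_for_gesture("swipe_back_to_front") == "scroll_next"
```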

In one example, when the identified finger gesture is a single tap on the input surface 181, this initiates selection or pressing of a graphical user interface element in the image presented on the image display of the optical assembly 180A-B. An adjustment to the image presented on the image display of the optical assembly 180A-B based on the identified finger gesture can be a primary action which selects or submits the graphical user interface element on the image display of the optical assembly 180A-B for further display or execution. This is just one example of a supported finger gesture, and it should be understood that several finger gesture types are supported by the eyewear device 100, which can include single or multiple finger contacts. Examples of multiple finger contact detected touch events and identified finger gestures are provided in FIGS. 6-9. Moreover, in some examples, the gesture may control other output components, such as a speaker of the eyewear device 100, e.g., controlling volume.

Eyewear device 100 may include wireless network transceivers, for example cellular or local area network transceivers (e.g., WiFi or Bluetooth™), and run sophisticated applications. Some of the applications may include a web browser to navigate the Internet, an application to place phone calls, video or image codecs to watch videos or interact with pictures, codecs to listen to music, a turn-by-turn navigation application (e.g., to enter a destination address and view maps), an augmented reality application, and an email application (e.g., to read and compose emails). Gestures inputted on the input surface 181 can be used to manipulate and interact with the displayed content on the image display and control the applications.

In an example, a neural network system includes the eyewear device 100. The eyewear device 100 includes a frame 105, a left temple 125A extending from a left lateral side 170A of the frame 105, and a right temple 125B extending from a right lateral side 170B of the frame 105. Eyewear device 100 further includes an input surface 181 on the frame 105, the temples 125A-B, the lateral sides 170A-B, or a combination thereof. Eyewear device 100 further includes an image display to present an image to a user and an image display driver coupled to the image display to control the image presented to the user. Eyewear device 100 further includes a movement tracker 118 connected to the input surface 181 to track movement of the eyewear device 100 from at least one finger contact 179 inputted from a user. The movement tracker 118 includes: (i) at least one accelerometer to measure acceleration, (ii) at least one gyroscope to measure rotation, or (iii) an inertial measurement unit (IMU) having the at least one accelerometer and the at least one gyroscope.

An inertial measurement unit is an electronic device that measures and reports a body's specific force, angular rate, and sometimes the magnetic field surrounding the body, using a combination of accelerometers and gyroscopes, and sometimes also magnetometers. If a magnetometer is present, the magnetic field can be used as input to the neural network to detect specific gestures that are dependent on Earth's or an artificial magnetic field. In this example, the inertial measurement unit determines a rotation acceleration of the eyewear device 100. The inertial measurement unit 219 works by detecting linear acceleration using one or more accelerometers and rotational rate using one or more gyroscopes. Typical configurations of inertial measurement units contain one accelerometer, gyroscope, and magnetometer per axis for each of the three axes: a horizontal axis (X) for left-right movement, a vertical axis (Y) for top-bottom movement, and a depth or distance axis (Z) for front-back movement. The gyroscope detects the rate of rotation around the three axes (X, Y, and Z). The magnetometer detects the magnetic field (e.g., facing south, north, etc.) like a compass, which generates a heading reference that is a mixture of Earth's magnetic field and other artificial magnetic fields (such as those generated by power lines). The three accelerometers detect acceleration along the horizontal (X), vertical (Y), and depth or distance (Z) axes defined above, which can be defined relative to the ground, the eyewear device 100, the depth-capturing camera, or the user wearing the eyewear device 100. Thus, the accelerometer detects a 3-axis acceleration vector, which can then be used to estimate Earth's gravity vector.
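
As a minimal sketch of how the 3-axis acceleration vector can be used to estimate the gravity vector, the snippet below low-pass filters accelerometer samples. It is not the device's firmware; the smoothing factor and sample values are assumptions for illustration.

```python
import math

# Illustrative sketch: estimate Earth's gravity vector from a stream of
# 3-axis accelerometer samples (in g) with a simple low-pass filter.
# The smoothing factor ALPHA is an assumed value, not taken from the text.
ALPHA = 0.1

def update_gravity(gravity, accel):
    """Blend the new accelerometer sample into the running gravity estimate."""
    return tuple(ALPHA * a + (1.0 - ALPHA) * g for g, a in zip(gravity, accel))

def tilt_from_gravity(gravity):
    """Angle (degrees) between the estimated gravity vector and the device Y axis."""
    gx, gy, gz = gravity
    norm = math.sqrt(gx * gx + gy * gy + gz * gz) or 1.0
    return math.degrees(math.acos(max(-1.0, min(1.0, gy / norm))))

# Example: a device lying flat keeps reporting roughly (0, 0, 1) as gravity.
gravity = (0.0, 0.0, 1.0)
for sample in [(0.02, 0.01, 0.99), (0.03, -0.02, 1.01)]:
    gravity = update_gravity(gravity, sample)
```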

The neural network system further includes a memory (element 334 of FIG. 3A), for example, in the eyewear device 100 itself or other components of the neural network system. The neural network system further includes a processor (element 332 of FIG. 3A) coupled to the image display driver (element 342 of FIG. 3A), the movement tracker 118, and the memory (element 334 of FIG. 3A), for example, in the eyewear device 100 itself or other components of the neural network system.

The neural network system further includes programming (element 345 of FIG. 3A) in the memory, and execution of the programming (element 345 of FIG. 3A) by the processor (element 332 of FIG. 3A) configures the eyewear device 100 to track, via the movement tracker 118, movement of the eyewear device 100 from the at least one finger contact 179 inputted from the user on the input surface 181. Tracking, via the movement tracker 118, includes: (i) measuring, via the at least one accelerometer, the acceleration of the eyewear device 100, (ii) measuring, via the at least one gyroscope, the rotation of the eyewear device 100, or (iii) measuring, via the inertial measurement unit, both the acceleration and the rotation of the eyewear device 100. Execution of the programming (element 345 of FIG. 3A) by the processor (element 332 of FIG. 3A) further configures the eyewear device 100 to identify a finger gesture on the input surface 181 of the eyewear device 100 by detecting at least one detected touch event based on variation of the tracked movement of the eyewear device 100 over a time period. Execution of the programming (element 345 of FIG. 3A) by the processor (element 332 of FIG. 3A) further configures the eyewear device 100 to adjust the image presented on the image display of the optical assembly 180A-B of the eyewear device 100 based on the identified finger gesture.

Other arrangements of the movement tracker 118 and input surface 181 can be implemented. In some arrangements, the input surface is on the left rim 107A or right rim 107B, or in different locations on the frame 105 or one or both of the chunks 110A-B or lateral sides 170A-B. The movement tracker 118 can be connected essentially anywhere on the frame 105, left chunk 110A, right chunk 110B, temples 125A-B, etc. to track movement 186 of the eyewear device 100, for example, resulting from finger contact 179.

FIG. 2 is a top cross-sectional view of a left chunk 110A of the eyewear device 100 of FIGS. 1B and 1C depicting the movement tracker 118 and a circuit board 140A. In the example, the left chunk 110A includes a flexible printed circuit board 140A that includes the movement tracker 118. The left chunk 110A is integrated into or connected to the frame 105 on the left lateral side 170A. In some examples, the right chunk 110B may include the movement tracker 118 in a similar construction.

As described in FIGS. 3A-F and 4A-C below, the neural network system implemented in the gesture, wear, carry, or activity detection programming is made up of artificial neurons that have learnable weights and biases. These convolutional neural network (CNN) models are built via a host computer, such as the server system 398. The host computer may be a personal computer, embedded high-speed processor, GPU, FPGA, or any other system that performs neural network training. Server system 398 may be one or more computing devices as part of a service or network computing system, for example, that include a processor, a memory, and a network communication interface to communicate over the network 395 with the mobile device 390 and eyewear device 100. It is not required for the server system 398 to communicate with the mobile device 390 or the eyewear device 100; the computing system that trains the neural network can be standalone and not connected to any network. To have a high confidence level that the tracked movement over time period 360 is a recognized gesture or activity, or that the device is being worn or carried, many (e.g., hundreds or thousands of) features, such as measurements 361A-N, 364A-N taken by the movement tracker 118, are inputted as the model input layer to the neural network model. However, to improve speed, memory footprint (both RAM and disk/flash storage), and efficiency, the gesture detection programming 345, wear/carry detection programming 386, or activity detection programming 302 may short-circuit the procedure when, for example, 5-10 salient measurement features are matched instead of all one hundred or one thousand measurement features of the recognized gesture, activity, or wear/carry neural network models. Moreover, having a fast, low memory footprint neural network forward-pass system enables the gesture detection programming 345, wear/carry detection programming 386, or activity detection programming 302 to execute and run on a low-power computational system, such as the low-power circuitry 320 included in the eyewear device 100. Specifically, the trained neural network (forward pass) runs on the low-power circuitry. Hence, although shown in the memory 334 of the high-speed circuitry 330, the neural network, such as gesture detection programming 345, wear/carry detection programming 386, or activity detection programming 302, can be stored and executed by the low-power processor 322. The training itself is done offline as a one-time task on a server or computer, but can be done on any high-speed computing platform.

The basic unit of computation in a neural network is the neuron, often called a node or unit. The neuron receives input from some other nodes, or from an external source, and computes an output. Each input has an associated weight (w), which is assigned on the basis of its relative importance to other inputs. The input layer provides information from the outside world to the neural network and consists of input nodes (e.g., tracked movement over time period 360 by the movement tracker 118). The hidden layer 348 (e.g., touch events 349A-N) has no direct connection with the outside world, but performs computations and transfers information from the input nodes to the output nodes. The output layer is responsible for computations and for transferring information from the neural network to the outside world (e.g., identified finger gestures 369A-N and confidence levels 371A-N). The output layer is responsible for the final, classified network outputs. Classified outputs identify the gesture, such as a double tap. Further algorithms process this result into an action.

The neuron applies a function (f) to the weighted sum of its inputs. Each neuron receives some inputs, performs a dot product, and optionally follows it with a non-linearity. The non-linearity here is the function f: Output = f(sum(input(i) * weight(i))). Hence, the neural network is formed in three layers, called the input layer, hidden layer, and output layer. The function f is non-linear and is called the activation function. The purpose of the activation function is to introduce non-linearity into the output of a neuron. This is important because most real-world data is non-linear, and this allows the neurons to learn from these non-linear representations.

Every activation function (or non-linearity) takes a single number and performs a certain fixed mathematical operation on it. There are several activation functions encountered in practice: (i) sigmoid, which takes a real-valued input and squashes it to the range between 0 and 1; (ii) tanh, which takes a real-valued input and squashes it to the range [−1, 1]; and (iii) ReLU (Rectified Linear Unit), which takes a real-valued input and thresholds it at zero (replaces negative values with zero).
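
The sketch below illustrates the neuron computation just described, Output = f(sum(input(i) * weight(i))), with the three common activation functions. The weights, bias, and input values are arbitrary example numbers, not parameters of any model described here.

```python
import math

# Illustrative sketch of a single neuron: weighted sum of inputs followed by
# an activation function. Values below are arbitrary examples.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))        # squashes to (0, 1)

def tanh(x):
    return math.tanh(x)                      # squashes to [-1, 1]

def relu(x):
    return max(0.0, x)                       # thresholds at zero

def neuron(inputs, weights, bias, activation):
    weighted_sum = sum(i * w for i, w in zip(inputs, weights)) + bias
    return activation(weighted_sum)

# Example usage with arbitrary values:
print(neuron([0.5, -1.2, 0.3], [0.8, 0.1, -0.4], bias=0.05, activation=relu))
```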

FIG. 3A is a high-level functional block diagram of an example neural network system 300 including the eyewear device 100 with the movement tracker 118 to identify finger gestures based on a neural network model (gesture detection programming 345), a mobile device 390, and a server system 398 connected via various networks. Eyewear device 100 is connected with a host computer. For example, the eyewear device 100 is paired with the mobile device 390 via the high-speed wireless connection 337 or connected to the server system 398 via the network 395.

Neural network system 300 includes a wearable device 399, which is the eyewear device 100 in the example of FIG. 3A. The wearable device 399 can also be a watch as shown in FIG. 3B, a wristband, or another portable device designed to be worn by a user to communicate via one or more wireless networks or wireless links with mobile device 390 or server system 398. Mobile device 390 may be a smartphone, tablet, laptop computer, access point, or any other such device capable of connecting with eyewear device 100 using both a low-power wireless connection 325 and a high-speed wireless connection 337. Mobile device 390 is connected to server system 398 and network 395. The network 395 may include any combination of wired and wireless connections.

Eyewear device 100 includes the movement tracker 118 and a depth-capturing camera, such as at least one of the visible light cameras 114A-B, and a depth sensor (not shown, but comprising an infrared emitter and an infrared camera). Eyewear device 100 further includes two image displays of the optical assembly 180A-B (one associated with the left lateral side 170A and one associated with the right lateral side 170B). Eyewear device 100 also includes image display driver 342, image processor 312, low-power circuitry 320, and high-speed circuitry 330. The image display of optical assembly 180A-B is for presenting images and videos, which can include a sequence of images. Image display driver 342 is coupled to the image display of optical assembly 180A-B to control the image display of optical assembly 180A-B to present the images. The components shown in FIGS. 3A, 3C, and 3E for the eyewear device 100 are located on one or more circuit boards, for example a PCB or flexible PCB, in the rims 107A-B or temples 125A-B. Alternatively or additionally, the depicted components can be located in the chunks 110A-B, frame 105, hinges 226A-B, or bridge 106 of the eyewear device 100.

Generally, the neural network is pre-trained with a labeled data set; then, on the eyewear device 100, the neural network is executed through a forward-pass mechanism where the inputs 359A-N are presented and the trained weights are used to calculate the outputs 368A-N. The outputs represent the probabilities of each class of gesture or activity to be detected.

In the neural network system 300, eyewear device 100 includes a gesture model input layer 359A-N, which is tracked movement over time period 360 for the eyewear device 100. Tracked movement over time period 360 includes accelerometer measurements 361A-N, which include measured acceleration (MA) 362A-N and measured acceleration time coordinates 363A-N to indicate when the measured acceleration 362A-N was taken. Tracked movement over time period 360 further includes gyroscope measurements 364A-N, which include measured rotation (MR) 365A-N, measured rotation time coordinates 366A-N to indicate when the measured rotation 365A-N was taken, and motion interrupt time coordinates 367A-N (e.g., times when motion is detected).
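
One possible in-memory layout for this input layer is sketched below: timestamped accelerometer and gyroscope samples plus motion interrupt times, flattened into the numeric vector fed to the network. The class and field names are illustrative assumptions, not reference characters from the figures.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Illustrative sketch of the gesture model input layer: timestamped IMU
# samples collected over the tracked time period. Names are assumptions.

@dataclass
class ImuSample:
    t: float                      # time coordinate of the measurement (seconds)
    accel: Tuple[float, float, float]   # measured acceleration (x, y, z)
    gyro: Tuple[float, float, float]    # measured rotation rate (x, y, z)

@dataclass
class ModelInputWindow:
    samples: List[ImuSample] = field(default_factory=list)
    motion_interrupts: List[float] = field(default_factory=list)  # times motion was detected

    def as_input_layer(self) -> List[float]:
        """Flatten the window into the numeric vector presented to the network."""
        flat: List[float] = []
        for s in self.samples:
            flat.extend(s.accel)
            flat.extend(s.gyro)
        return flat
```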

As shown, memory 334 further includes gesture detection programming 345 to perform a subset or all of the functions described herein for gesture detection. Although a neural network model can include an input layer, hidden layers, and an output layer, in this example the neural network model of the gesture detection programming 345 includes several convolutional layers, fully connected layers (in place of the hidden layers), and a single output layer. Gesture detection programming 345 has a trained machine code gesture model 346, a set of gesture weights 347A-N, and hidden layers 348 (which include touch events 349A-N). Memory 334 further includes a gesture model output layer 368A-N. Gesture model output layer 368A-N has identified finger gestures 369A-N, confidence levels 371A-N for the identified finger gestures 369A-N, and a recognized gesture 372 that is the most likely gesture of the identified finger gestures 369A-N.

High-speed circuitry 330 includes high-speed processor 332, memory 334, and high-speed wireless circuitry 336. In the example, the image display driver 342 is coupled to the high-speed circuitry 330 and operated by the high-speed processor 332 in order to drive the left and right image displays of the optical assembly 180A-B. High-speed processor 332 may be any processor capable of managing high-speed communications and operation of any general computing system needed for eyewear device 100. High-speed processor 332 includes processing resources needed for managing high-speed data transfers on high-speed wireless connection 337 to a wireless local area network (WLAN) using high-speed wireless circuitry 336. In certain embodiments, the high-speed processor 332 executes firmware that includes the gesture detection programming 345 and an operating system, such as a LINUX operating system or other such operating system of the eyewear device 100, and the operating system is stored in memory 334 for execution. In addition to any other responsibilities, the high-speed processor 332 executing a software architecture for the eyewear device 100 is used to manage data transfers with high-speed wireless circuitry 336. In certain embodiments, high-speed wireless circuitry 336 is configured to implement Institute of Electrical and Electronics Engineers (IEEE) 802.11 communication standards, also referred to herein as Wi-Fi. In other embodiments, other high-speed communications standards may be implemented by high-speed wireless circuitry 336.

Low-power wireless circuitry 324 and the high-speed wireless circuitry 336 of the eyewear device 100 can include short-range transceivers (Bluetooth™) and wireless wide or local area network transceivers (e.g., cellular or WiFi). Mobile device 390, including the transceivers communicating via the low-power wireless connection 325 and high-speed wireless connection 337, may be implemented using details of the architecture of the eyewear device 100, as can other elements of network 395.

Memory 334 includes any storage device capable of storing various data and applications, including, among other things, gesture model input layer 359A-N, gesture detection programming 345, gesture model output layer 368A-N, camera data generated by the left and right visible light cameras 114A-B and the image processor 312, as well as images and videos generated for display by the image display driver 342 on the image displays of the optical assembly 180A-B. While memory 334 is shown as integrated with high-speed circuitry 330, in other embodiments, memory 334 may be an independent standalone element of the eyewear device 100. In certain such embodiments, electrical routing lines may provide a connection through a chip that includes the high-speed processor 332 from the image processor 312 or low-power processor 322 to the memory 334. In other embodiments, the high-speed processor 332 may manage addressing of memory 334 such that the low-power processor 322 will boot the high-speed processor 332 any time that a read or write operation involving memory 334 is needed.

As shown in FIG. 3A, the processor 332 of the eyewear device 100 can be coupled to the movement tracker 118, the depth-capturing camera (visible light cameras 114A-B; or visible light camera 114A and a depth sensor, which is not shown), the image display driver 342, and the memory 334. Eyewear device 100 can perform all or a subset of any of the following functions described below as a result of the execution of the gesture detection programming 345 in the memory 334 by the processor 332 of the eyewear device 100. As shown in FIG. 4A, mobile device 390 can perform all or a subset of any of the following functions described below as a result of the execution of the gesture detection programming 345 in the memory 440A by the processor 430 of the mobile device 390.

In the example of FIG. 3A, execution of the gesture detection programming 345 by the processor 332 configures the eyewear device 100 to perform functions, including functions to track, via the movement tracker 118, movement of the eyewear device 100 from the at least one finger contact 179 inputted from the user on the input surface 181 by: (i) measuring, via the at least one accelerometer, the acceleration 362A-N of the eyewear device 100, (ii) measuring, via the at least one gyroscope, the rotation 365A-N of the eyewear device 100, or (iii) measuring, via the inertial measurement unit, both the acceleration 362A-N and the rotation 365A-N of the eyewear device 100. Eyewear device 100 identifies a finger gesture 369A-N on the input surface 181 of the eyewear device by detecting at least one detected touch event 349A-N based on variation of the tracked movement of the eyewear device 100 over a time period 360. Eyewear device 100 adjusts the image presented on the image display 180A-B of the eyewear device 100 based on the identified finger gesture 369A-N.

Movement tracker 118 is further configured to track over the time period a respective time coordinate 363A-N, 366A-N, 367A-N for each taken measurement of: (i) the measured acceleration 362A-N via the accelerometer, (ii) the measured rotation 365A-N via the at least one gyroscope, or (iii) both the measured acceleration 362A-N and the rotation 365A-N via the inertial measurement unit. Eyewear device 100 tracks, via the movement tracker 118, the respective time coordinate 363A-N, 366A-N, 367A-N for each taken measurement 362A-N, 365A-N. The function to detect the at least one touch event 349A-N based on the variation of the tracked movement over the time period 360 is further based on the respective time coordinate 363A-N, 366A-N, 367A-N for each taken measurement 362A-N, 365A-N.

The function to identify the finger gesture on the input surface 181 of the eyewear device 100 by detecting the at least one detected touch event 349A-N based on variation of the tracked movement 360 of the eyewear device 100 over the time period includes the following functions. First, applying multiple model inputs 359A-N that include taken measurements of: (i) the measured acceleration 362A-N via the accelerometer, (ii) the measured rotation 365A-N via the at least one gyroscope, or (iii) both the measured acceleration 362A-N and the rotation 365A-N via the inertial measurement unit, taken at a sampling frequency during the time period 360, to a recognized gesture model 346 to determine similarity of the variation of the tracked movement 360 to a recognized gesture in the recognized gesture model 346. The recognized gesture model 346 includes a set of gesture weights 347A-N based on acquired training data 376A-N of: (i) acceleration 378A-N, (ii) rotation 381A-N, or (iii) both the acceleration 378A-N and the rotation 381A-N over one or more time intervals. Second, determining a model output 368A-N that includes a respective confidence level 371A-N of each respective recognized gesture 369A-N based on the determined similarity of the variation of the tracked movement 360 to the respective recognized gesture 369A-N. The neural network is pre-trained with a labeled data set; then, on the eyewear device 100, the neural network (e.g., gesture detection programming 345) is executed through a forward-pass mechanism (e.g., trained machine code gesture model 346) where the inputs 359A-N are presented and the trained weights (set of gesture weights 347A-N) are used to calculate the outputs 368A-N. The outputs 368A-N represent the probabilities of each class of gesture to be detected.

The functions to apply the multiple model inputs 359A-N to determine similarity of the variation of the tracked movement 360 to the recognized gesture in the recognized gesture model 346 and to determine the model output 368A-N are embedded as firmware programming in the eyewear device 100. The function to identify the finger gesture on the input surface 181 of the eyewear device 100 by detecting the at least one detected touch event 349A-N based on variation of the tracked movement 360 of the eyewear device 100 over the time period further includes identifying the recognized gesture 372 as the respective recognized gesture 369A-N that has the respective confidence level 371A-N with a highest value in the model output 368A-N. The function to identify the recognized gesture as the respective recognized gesture 369A-N that has the respective confidence level 371A-N with the highest value in the model output 368A-N is application layer programming in the eyewear device 100.
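
The split just described can be sketched as follows: a "firmware-side" forward pass that turns an input window into per-gesture confidence levels, and an "application-layer" step that selects the recognized gesture as the class with the highest confidence. The tiny linear model and the class names are stand-in assumptions; the actual trained machine code gesture model 346 is not reproduced here.

```python
import math

# Illustrative sketch only: a single linear layer plus softmax stands in for
# the trained CNN, and the gesture class names are assumptions.

GESTURE_CLASSES = ["single_tap", "double_tap", "swipe", "pinch", "none"]

def softmax(scores):
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def forward_pass(input_vector, weights, biases):
    """Firmware side: compute a confidence level for each gesture class."""
    scores = [sum(w * x for w, x in zip(row, input_vector)) + b
              for row, b in zip(weights, biases)]
    return softmax(scores)

def recognize(confidences):
    """Application-layer side: recognized gesture = class with highest confidence."""
    best = max(range(len(confidences)), key=lambda i: confidences[i])
    return GESTURE_CLASSES[best], confidences[best]
```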

Output components of the eyewear device 100 include visual components, such as the left and right image displays of optical assembly 180A-B as described in FIGS. 1B-C (e.g., a display such as a liquid crystal display (LCD), a plasma display panel (PDP), a light emitting diode (LED) display, a projector, or a waveguide). Left and right image displays of optical assembly 180A-B can present images, such as in a video. The image displays of the optical assembly 180A-B are driven by the image display driver 342. The output components of the eyewear device 100 further include acoustic components (e.g., speakers), haptic components (e.g., a vibratory motor), other signal generators, and so forth. The input components of the eyewear device 100, the mobile device 390, and server system 398 may include alphanumeric input components (e.g., a keyboard, a touch screen configured to receive alphanumeric input, a photo-optical keyboard, or other alphanumeric input components), point-based input components (e.g., a mouse, a touchpad, a trackball, a joystick, a motion sensor, or other pointing instruments), tactile input components (e.g., a physical button, a touch screen that provides location and force of touches or touch gestures, or other tactile input components), audio input components (e.g., a microphone), and the like.

Eyewear device 100 may optionally include additional peripheral device elements. Such peripheral device elements may include biometric sensors, additional sensors, or display elements integrated with eyewear device 100. For example, peripheral device elements may include any I/O components including output components, motion components, position components, or any other such elements described herein.

For example, the biometric components include components to detect expressions (e.g., hand expressions, facial expressions, vocal expressions, body gestures, or eye tracking), measure biosignals (e.g., blood pressure, heart rate, body temperature, perspiration, or brain waves), identify a person (e.g., voice identification, retinal identification, facial identification, fingerprint identification, or electroencephalogram based identification), and the like. The motion components include acceleration sensor components (e.g., accelerometer), gravitation sensor components, rotation sensor components (e.g., gyroscope), and so forth. The position components include location sensor components to generate location coordinates (e.g., a Global Positioning System (GPS) receiver component), WiFi or Bluetooth™ transceivers to generate positioning system coordinates, altitude sensor components (e.g., altimeters or barometers that detect air pressure from which altitude may be derived), orientation sensor components (e.g., magnetometers), and the like. Such positioning system coordinates can also be received over wireless connections 325 and 337 from the mobile device 390 via the low-power wireless circuitry 324 or high-speed wireless circuitry 336.

FIG. 3B shows an example of a hardware configuration for the server system 398 of the neural network system 300 of FIG. 3A to build a neural network model for identifying finger gestures (gesture detection programming 345), in simplified block diagram form. As further shown in FIG. 3B, server system 398 may be one or more computing devices as part of a service or network computing system, for example, that include a memory 350, a processor 360, and a network communication interface 361 to communicate over the network 395 with the mobile device 390 and a wearable device 399, such as the eyewear device 100. The memory 350 includes gesture training data (TD) 376A-N, which includes tracked movement over time intervals for known but unclassified gestures 377A-N. Gesture training data 376A-N includes accelerometer training data (TD) 378A-N. Accelerometer training data 378A-N has acceleration measurements 379A-N and acceleration time coordinates 380A-N to indicate when the acceleration measurement 379A-N was taken. Gesture training data 376A-N includes gyroscope training data 381A-N. Gyroscope training data 381A-N has rotation measurements 382A-N and rotation time coordinates 383A-N to indicate when the rotation measurement 382A-N was taken. Gesture training data 376A-N also includes motion interrupt time coordinates 384A-N (e.g., times when motion is detected).

Memory 350 also includes a gesture model generator, shown as gesture neural network programming 375. Memory 350 also includes gesture detection programming 345, which is outputted in response to applying the gesture neural network programming 375 to the inputted gesture training data 376A-N. As shown, the gesture detection programming 345 includes a trained machine code gesture model 346, a set of gesture weights 347A-N, and hidden layers 348, such as touch events 349A-N. The built gesture detection programming 345 is loaded in the eyewear device 100 or wearable device 399 for gesture detection.

Execution of the gesture neural network programming 375 by the processor 360 configures the server system 398 to perform some or all of the functions described herein before execution of the gesture detection programming 345 by the processor 332 of the eyewear device 100. First, acquire the training data 376A-N of: (i) acceleration 378A-N, (ii) rotation 381A-N, or (iii) both the acceleration 378A-N and the rotation 381A-N of the eyewear device 100 over one or more time intervals for the known, but unclassified, gesture 377A-N. Second, build the recognized gesture model 346 of the unclassified gesture based on the acquired training data 376A-N. The function to build the recognized gesture model 346 includes functions to calibrate the set of gesture weights 347A-N from the acquired training data 376A-N of the unclassified gesture, and store the calibrated set of gesture weights 347A-N in the recognized gesture model 346 in association with the recognized gesture.

Gesture neural network programming 375 generates a trained model based on training data. Training data consists of user-labeled inputs and outputs that are used to calculate the weights of the neural network. The scripting language here may be irrelevant, as this can be done in many languages or in hardware. To improve runtime efficiency of the built gesture detection programming 345, gesture neural network programming 375 may implement the following functions. Generate a trained scripting language gesture model in an interpreted programming language based on the acquired training data 376A-N of the unclassified gesture 377A-N. Extract the calibrated set of gesture weights 347A-N from the trained scripting language gesture model. Generate a trained inferencing code gesture model in a compiled programming language based on the trained scripting language gesture model. Compile the trained inferencing code gesture model into a trained machine inferencing code gesture model 346. Quantize the extracted calibrated set of gesture weights 347A-N.
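
As one way the extracted floating-point weights could be quantized for a memory-constrained device, the sketch below applies symmetric 8-bit quantization with a single per-tensor scale. The source does not specify a quantization scheme, so this is an assumption for illustration only.

```python
# Illustrative sketch: symmetric int8 quantization of extracted float weights.
# The scheme and example values are assumptions; the text only states that the
# calibrated set of gesture weights is quantized.

def quantize_weights(weights):
    """Map float weights to int8 values plus a scale for dequantization."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0
    q = [int(round(w / scale)) for w in weights]
    return q, scale

def dequantize(q_weights, scale):
    return [qw * scale for qw in q_weights]

# Example: each quantized weight occupies 1 byte instead of 4 (float32).
quantized, scale = quantize_weights([0.031, -0.42, 0.87, -0.005])
restored = dequantize(quantized, scale)
```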

A runtime-efficient version of the gesture detection programming 345, which can be stored in the memory 334 of the eyewear device 100, includes the trained inferencing machine code gesture model 346 and the quantized calibrated set of gesture weights 347A-N. To further improve runtime, the gesture detection programming 345 occupies only statically allocated memory during runtime.

FIG. 3C is a high-level functional block diagram of an example neural network system 300 including the eyewear device 100 with the movement tracker 118 to detect wear based on a neural network model (wear detection programming 386), a mobile device 390, and a server system 398 connected via various networks. Because the components of FIG. 3C were already explained in detail in FIG. 3A, repetition of some of the details is avoided here.

In the neural network system 300, the memory 334 of the eyewear device 100 includes a wear model input layer 385A-N, which is the tracked movement over time period 360 as described in FIG. 3A. As shown, memory 334 further includes wear detection programming 386. The wear detection programming 386 has a trained machine code wear model 387 and a set of wear weights 388A-N. Memory 334 further includes a wear model output layer 391. The wear model output layer 391 can be a single output value that is a wear detection confidence level 392, which is a probability, ranging from zero to one, that the eyewear device 100 is being worn. The not-being-worn confidence level is one minus the wear detection confidence level. Code outside of the neural network thresholds this value, typically at 0.5, to produce a binary classification.
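A minimal sketch of this thresholding step, which sits outside of the neural network itself, is shown below; the function name and the 0.5 default are illustrative.

    def classify_wear(wear_confidence_392, threshold=0.5):
        # wear_confidence_392: single output of the wear model, from 0.0 to 1.0.
        # The not-being-worn confidence is simply 1.0 - wear_confidence_392.
        return wear_confidence_392 >= threshold   # True means "worn"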

Execution of the wear detection programming 386 by the processor 332 configures the eyewear device 100 to perform the following functions. Track, via the movement tracker 118, movement of the eyewear device 100 from the at least one finger contact 179 inputted from the user on the input surface 181 by: (i) measuring, via the at least one accelerometer, the acceleration 362A-N of the eyewear device 100, (ii) measuring, via the at least one gyroscope, the rotation 365A-N of the eyewear device 100, or (iii) measuring, via the inertial measurement unit, both the acceleration 362A-N and the rotation 365A-N of the eyewear device 100. Detect whether the user is wearing the eyewear device 100 based on variation of the tracked movement of the eyewear device over a time period 360.

FIG. 3D shows an example of a hardware configuration for the server system 398 of the neural network system of FIG. 3C to build a neural network model for wear detection. Because the components of FIG. 3D were already explained in detail in FIG. 3B, repetition of some of the details is avoided here. In addition to detecting wear of a wearable device 399, such as the eyewear device 100, the methodology described herein can be used to enable carry detection of the mobile device 390 (e.g., determining when the user is walking with the device in the user's hand, on their person, or in a backpack or purse). The wear detection system can also be expanded to activity detection, such as distinguishing between standing, walking, and running while the device is worn, for example. It can be used to detect gaze or attention to a specific object (with the help of the magnetometer).

As further shown in FIG. 3D, memory 350 of server system 398 includes wear/carry training data (TD) 351A-N. Wear/carry training data (TD) 351A-N has tracked movement over time intervals during wearing/carrying 352A-N of the wearable device 399 (e.g., eyewear device 100) or the mobile device 390. Wear/carry training data 351A-N includes accelerometer training data (TD) 378A-N, which has acceleration measurements 379A-N and acceleration time coordinates 380A-N to indicate when the acceleration measurement 379A-N was taken. Wear/carry training data 351A-N includes gyroscope training data 381A-N, which includes rotation measurements 382A-N and rotation time coordinates 383A-N to indicate when the rotation measurement 382A-N was taken. Wear/carry training data 351A-N also includes motion interrupt time coordinates 384A-N (e.g., times when motion is detected).

Memory 350 also includes a wear/carry model generator, shown as wear/carry neural network programming 353. Memory 350 also includes wear/carry detection programming 386, which is outputted in response to applying the wear/carry neural network programming 353 to the inputted wear/carry training data 351A-N. As shown, the wear/carry detection programming 386 includes a trained machine code wear/carry model 387 and a set of wear/carry weights 388A-N. As used herein, the “trained machine code” is the inferencing code (e.g., written in the C language and then compiled into machine code to execute on the microcontroller) that runs a forward pass on a “model,” where the model includes a table of neural network weights. In this case the “model” is the set of wear/carry weights 388A-N. The “trained machine code model” (inferencing code) is the trained machine code wear/carry model 387. The built wear/carry detection programming 386 is loaded in the wearable device 399 (e.g., eyewear device) for wear detection or the mobile device 390 for carry detection.
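Conceptually, the forward pass run by the inferencing code over the table of weights can be sketched as follows (shown in Python for readability; the inferencing code described here is compiled C, and the dense/ReLU layer structure is an assumption).

    import numpy as np

    def forward_pass(input_layer, weight_table):
        # weight_table: list of (W, b) pairs, i.e. the "model"; this loop plays
        # the role of the compiled inferencing code on the microcontroller.
        x = np.asarray(input_layer, dtype=np.float32)
        for W, b in weight_table[:-1]:
            x = np.maximum(x @ W + b, 0.0)        # hidden layers (ReLU assumed)
        W, b = weight_table[-1]
        logits = x @ W + b
        e = np.exp(logits - logits.max())
        return e / e.sum()                         # output probabilities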

In the wear detection example, execution of the wear neural network programming 353 by the processor 360 configures the server system 398 to perform some or all of the functions described herein before execution of the wear detection programming 386 by the processor 332 of the eyewear device 100. First, acquire the training data 351A-N of: (i) acceleration 378A-N, (ii) rotation 381A-N, or (iii) both the acceleration 378A-N and the rotation 381A-N of the eyewear device 100 over one or more time intervals during wearing of the eyewear device 100 and when the eyewear device 100 is not being worn. Second, build the recognized wear model 387 based on the acquired training data 351A-N. The function to build the recognized wear model 387 includes calibrating the set of wear weights 388A-N from the acquired training data 351A-N during wearing of the eyewear device 100 and when the eyewear device 100 is not being worn; and storing the calibrated set of wear weights 388A-N (e.g., trained set or optimized set) in the recognized wear model 387 in association with a classification of the eyewear device 100 as being worn or unworn.

FIG. 3E is a high-level functional block diagram of an example neural network system including the eyewear device 100 with the movement tracker to identify activities based on a neural network model (activity detection programming 302) during wearing of the eyewear device 100, a mobile device 390, and a server system 398 connected via various networks. Because the components of FIG. 3E were already explained in detail in FIG. 3A, repetition of some of the details is avoided here.

In the neural network system 300, eyewear device 100 includes an activity model input layer 301A-N, which is tracked movement over time period 360. Tracked movement over time period 360 includes accelerometer measurements 361A-N, which have measured acceleration (MA) 362A-N and measured acceleration time coordinates 363A-N to indicate when the measured acceleration 362A-N was taken. Tracked movement over time period 360 further includes gyroscope measurements 364A-N, which include measured rotation (MR) 365A-N, measured rotation time coordinates 366A-N to indicate when the measured rotation 365A-N was taken, and motion interrupt time coordinates 367A-N (e.g., times when motion is detected). Memory 334 further includes an activity model output layer 306A-N, which includes identified activities 307A-N, confidence levels 308A-N for the identified activities 307A-N, and a recognized activity 309 that is the most likely activity of the identified activities 307A-N.

As shown, memory 334 further includes activity detection programming 302. Activity detection programming 302 includes a trained machine code activity model 303, a set of activity weights 304A-N, and hidden layers 389 (which include wear event 305). Execution of the activity detection programming 302 by the processor configures the eyewear device 100 to perform the following functions. In response to detecting that the user is wearing the eyewear device 100 as described in FIG. 3C, identify a recognized activity 309 of the user wearing the eyewear device 100 based on the variation of the tracked movement over the time period 360. Adjust an image presented on the image display 180A-B of the eyewear device 100 based on a recognized activity-based adjustment of the recognized activity 309. The recognized activity-based adjustment includes launch, hide, or display (e.g., opening) of an application for the user to interact with or utilize. The recognized activity-based adjustment includes display of a menu of applications related to the recognized activity for execution (e.g., a hint). The recognized activity-based adjustment includes control of a contextual notification to enable, disable, or restrict features of an application. The recognized activity-based adjustment includes enabling or disabling of a system level feature (e.g., powering on or off device peripherals, such as cameras 114A-B). The recognized activity-based adjustment may include a combination of the foregoing.
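One way to express such an activity-based adjustment in application-level code is sketched below; the activity names, the user-interface calls, and the threshold are hypothetical placeholders that only illustrate the mapping from a recognized activity 309 to an adjustment.

    # Hypothetical mapping from a recognized activity to an adjustment; the
    # activity names and the `ui` methods are illustrative placeholders.
    ACTIVITY_ADJUSTMENTS = {
        "biking":  lambda ui: ui.launch_app("route_overlay"),
        "jogging": lambda ui: ui.restrict_notifications(),
        "driving": lambda ui: ui.disable_peripheral("camera"),
    }

    def apply_activity_adjustment(recognized_activity, confidence, ui, threshold=0.5):
        action = ACTIVITY_ADJUSTMENTS.get(recognized_activity)
        if action is not None and confidence >= threshold:
            action(ui)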

FIG. 3F shows an example of a hardware configuration for the server system of the neural network system 300 of FIG. 3E to build a neural network model for identifying activities during wearing of the eyewear device (activity detection programming 302), in simplified block diagram form. The memory 350 includes activity training data (TD) 310A-N, which includes tracked movement over time intervals for known but unclassified activities 311A-N. Activity training data 310A-N includes accelerometer training data (TD) 378A-N, which has acceleration measurements 379A-N and acceleration time coordinates 380A-N to indicate when the acceleration measurement 379A-N was taken. Activity training data 310A-N includes gyroscope training data 381A-N, which includes rotation measurements 382A-N and rotation time coordinates 383A-N to indicate when the rotation measurement 382A-N was taken. Activity training data 310A-N also includes motion interrupt time coordinates 384A-N (e.g., times when motion is detected).

Memory 350 also includes an activity model generator, shown as activity neural network programming 312. Memory 350 also includes activity detection programming 302, which is outputted in response to applying the activity neural network programming 312 to the inputted activity training data 310A-N. As shown, the activity detection programming 302 includes a trained machine code activity model 303, a set of activity weights 304A-N, and hidden layers 389, such as a wear event 305. The built activity detection programming 302 is loaded in the wearable device 399 (e.g., eyewear device 100) or the mobile device 390 for activity detection.

Execution of the activity neural network programming 312 by the processor 360 configures the server system 398 to perform various functions before execution of the activity detection programming 302 by the processor 332 of the eyewear device 100. First, server system 398 acquires the training data 310A-N of: (i) acceleration 378A-N, (ii) rotation 381A-N, or (iii) both the acceleration 378A-N and the rotation 381A-N of the eyewear device 100 over one or more time intervals of the unclassified activity 311A-N. Second, server system 398 builds the recognized activity model 303 of the unclassified activity based on the acquired training data 310A-N. Building the recognized activity model 303 includes calibrating the set of activity weights 304A-N of the unclassified activity from the acquired training data 310A-N of the unclassified activity 311A-N. Building the recognized activity model 303 further includes storing the calibrated set of activity weights 304A-N in the recognized activity model 303 in association with the recognized activity.

As described in FIGS. 3A-C, gesture detection programming 345, wear/carry detection programming 386, or activity detection programming 302 of the eyewear device 100 is stored locally in a read-only memory (ROM), erasable programmable read-only memory (EPROM), or flash memory of high-speed circuitry 330. A firmware layer of the gesture detection programming 345, wear/carry detection programming 386, or activity detection programming 302 returns a keyword corresponding to the recognized gesture or activity, and a confidence level that the tracked movement over time period 360 corresponds to the gesture or activity, to the application layer of the gesture detection programming 345, wear/carry detection programming 386, or activity detection programming 302. The neural network returns a number of probabilities, one for each activity to be recognized. Typically, the final result (the activity) is determined by choosing the maximum probability beyond some threshold (typically 0.5). Firmware resides below the operating system level and is more efficient, which optimizes speed of execution by calling the hardware directly, for example. An application layer of the gesture detection programming 345, wear/carry detection programming 386, or activity detection programming 302 determines a recognized-gesture or activity-based adjustment to take depending on the recognized activity or gesture and the confidence level. Having the recognized-gesture or activity-based adjustment determination reside in the application layer allows dynamic changes to be made with updates distributed from the server system 398 via the networks 395, 337.
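A minimal sketch of this application-layer selection logic, assuming the firmware layer hands back a keyword-to-confidence mapping, is:

    def pick_result(probabilities, threshold=0.5):
        # probabilities: dict mapping keyword (gesture or activity) to the
        # confidence level returned by the firmware layer.
        keyword, confidence = max(probabilities.items(), key=lambda kv: kv[1])
        if confidence >= threshold:
            return keyword, confidence
        return None, confidence                   # below threshold: no adjustment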

In some examples in which runtime is not deemed important, to allow for propagated updates to the gesture detection programming 345, wear/carry detection programming 386, or activity detection programming 302, firmware is not utilized for gesture, wear/carry, or activity detection and the entire logic resides in the application layer in volatile-type memory 334. This can enable updates, which are transmitted from the server system 398 via the networks 395, 337. For example, the server system 398 receives, via the network communication interface 361, crowdsourced additional training data 376A-N, 351A-N, 310A-N from the same type of wearable device 399 or mobile device 390 of a different user. Server system 398 updates the corresponding recognized model (e.g., the gesture model 346, wear/carry model 387, or activity model 303) based on the crowdsourced additional training data 376A-N, 351A-N, 310A-N from the same type of wearable device 399 (e.g., eyewear device) or same mobile device 390, but from different users. Updating the gesture detection programming 345, wear/carry detection programming 386, or activity detection programming 302 includes reapplying the model generator to rebuild the gesture detection programming 345, wear/carry detection programming 386, or activity detection programming 302. Re-training is done offline on a high-speed computing device. The result is a set of calculated weights, which can be transmitted to the eyewear device to update the neural network. Typically, this is called an over-the-air (OTA) update and can include other software. Another mechanism for updating is one that is done in the background without user intervention. Server system 398 then sends, via the network 395, the gesture detection programming 345, wear/carry detection programming 386, or activity detection programming 302 to the wearable device 399 (e.g., eyewear device) or the mobile device 390.

FIG. 4A shows an example of a hardware configuration for the wearable device 399 or the mobile device 390 of the neural network system of FIGS. 3A-B, which includes the movement tracker 118 to identify finger gestures based on a neural network model (gesture detection programming 345). As shown in FIG. 4A, the gesture detection programming 345 implemented in the wearable device 399 or mobile device 390 for gesture detection is identical to that shown in FIG. 3A for the eyewear device 100. The gesture neural network programming 375 of FIG. 3B trains the gesture detection programming 345 for the wearable device 399 and the mobile device 390 in the same manner as for the eyewear device 100. However, because the gesture training data 376A-N and the gesture model input layer 359A-N vary with the type and value of the measurements taken by the movement tracker 118, as well as with the form factor of the eyewear device 100, wearable device 399, and mobile device 390, the set of gesture weights 347A-N and the trained machine code gesture model 346 may vary from device to device. As a result, the gesture detection programming 345 may be device specific. In some examples, the IMU location and transforms can be re-applied to another similar device and the same neural network can be used. The accuracy will not be as good, but in this case only a shorter/smaller re-training pass needs to be done.

FIG. 4B shows an example of a hardware configuration for the wearable device 399 or the mobile device 390 of the neural network system 300 of FIGS. 3C-D, which includes the movement tracker 118 to detect wear of the wearable device 399 or carrying of the mobile device 390 based on a neural network model. As shown in FIG. 4B, the wear detection programming 386 implemented in the wearable device 399 for wear detection or the carry detection programming 386 implemented in the mobile device 390 for carry detection is identical to that shown in FIG. 3C for the eyewear device 100. The wear or carry neural network programming 353 of FIG. 3D trains the wear detection programming 386 for the wearable device 399 and the carry detection programming 386 for the mobile device 390 in the same manner as for the eyewear device 100. However, because the wear or carry training data 351A-N and the wear or carry model input layer 385A-N vary with the type and value of the measurements taken by the movement tracker 118, as well as with the form factor of the eyewear device 100, wearable device 399, and mobile device 390, the set of wear or carry weights 388A-N and the trained machine code wear or carry model 387 will vary from device to device. As a result, the wear or carry detection programming 386 is inherently device specific.

FIG. 4C shows an example of a hardware configuration for the wearable device 399 or the mobile device 390 of the neural network system 300 of FIGS. 3E-F, which includes the movement tracker 118 to identify activities based on a neural network model. As shown in FIG. 4C, the activity detection programming 302 implemented in the wearable device 399 or mobile device 390 for activity detection is identical to that shown in FIG. 3E for the eyewear device 100. The activity neural network programming 312 of FIG. 3F trains the activity detection programming 302 for the wearable device 399 and the mobile device 390 in the same manner as for the eyewear device 100. However, because the activity training data 310A-N and the activity model input layer 301A-N vary with the type and value of the measurements taken by the movement tracker 118, as well as with the form factor of the eyewear device 100, wearable device 399, and mobile device 390, the set of activity weights 304A-N and the trained machine code activity model 303 will vary from device to device. As a result, the activity detection programming 302 is inherently device specific.

As shown in FIGS. 4A-C, the wearable device 399 or the mobile device 390 includes an image display 480 and an image display driver 490 to control the image display 480. In the example of FIG. 4A, the image display 480 and a user input device 491 are integrated together into a touch screen display. Examples of touch screen type mobile devices that may be used include (but are not limited to) a smart phone, a personal digital assistant (PDA), a tablet computer, a laptop computer, or another portable device. However, the structure and operation of the touch screen type devices are provided by way of example; the subject technology as described herein is not intended to be limited thereto. For purposes of this discussion, FIGS. 4A-C therefore provide block diagram illustrations of the example mobile device 390 and the wearable device 399 having a touch screen display for displaying content and receiving user input as (or as part of) the user interface.

The activities that are the focus of discussions here typically involve data communications related to detecting finger gestures, wearing/carrying, or activities of a wearable device (e.g., eyewear device 100) or the mobile device 390. As shown in FIGS. 4A-C, the mobile device 390 and the wearable device 399 include at least one digital transceiver (XCVR) 410, shown as WWAN XCVRs, for digital wireless communications via a wide area wireless mobile communication network. The mobile device 390 and the wearable device 399 also include additional digital or analog transceivers, such as short range XCVRs 420 for short-range network communication, such as via NFC, VLC, DECT, ZigBee, Bluetooth™, or WiFi. For example, short range XCVRs 420 may take the form of any available two-way wireless local area network (WLAN) transceiver of a type that is compatible with one or more standard protocols of communication implemented in wireless local area networks, such as one of the Wi-Fi standards under IEEE 802.11 and WiMAX.

To generate location coordinates for positioning of the mobile device 390 and the wearable device 399, the mobile device 390 and the wearable device 399 can include a global positioning system (GPS) receiver. Alternatively, or additionally, the mobile device 390 and the wearable device 399 can utilize either or both of the short range XCVRs 420 and WWAN XCVRs 410 for generating location coordinates for positioning. For example, cellular network, WiFi, or Bluetooth™ based positioning systems can generate very accurate location coordinates, particularly when used in combination. Such location coordinates can be transmitted to the eyewear device over one or more network connections via XCVRs 410, 420.

The transceivers 410, 420 (network communication interface) conform to one or more of the various digital wireless communication standards utilized by modern mobile networks. Examples of WWAN transceivers 410 include (but are not limited to) transceivers configured to operate in accordance with Code Division Multiple Access (CDMA) and 3rd Generation Partnership Project (3GPP) network technologies including, for example and without limitation, 3GPP type 2 (or 3GPP2) and LTE, at times referred to as “4G.” For example, the transceivers 410, 420 provide two-way wireless communication of information including digitized audio signals, still image and video signals, web page information for display as well as web related inputs, and various types of mobile message communications to/from the mobile device 390 or the wearable device 399 for the neural network system 300.

Several of these types of communications through the transceivers 410, 420 and a network, as discussed previously, relate to protocols and procedures in support of communications with the eyewear device 100 or the server system 398 for detecting finger gestures, wearing, carrying, or activities. Such communications, for example, may transport packet data via the short range XCVRs 420 over the wireless connections 325 and 337 to and from the eyewear device 100 as shown in FIGS. 3A, 3C, and 3E. Such communications, for example, may also transport data utilizing IP packet data transport via the WWAN XCVRs 410 over the network (e.g., Internet) 395 shown in FIGS. 3B, 3D, and 3F. Both WWAN XCVRs 410 and short range XCVRs 420 connect through radio frequency (RF) send-and-receive amplifiers (not shown) to an associated antenna (not shown).

The wearable device 399 and the mobile device 390 further include a microprocessor, shown as CPU 430, sometimes referred to herein as the host controller. A processor is a circuit having elements structured and arranged to perform one or more processing functions, typically various data processing functions. Although discrete logic components could be used, the examples utilize components forming a programmable CPU. A microprocessor for example includes one or more integrated circuit (IC) chips incorporating the electronic elements to perform the functions of the CPU. The processor 430, for example, may be based on any known or available microprocessor architecture, such as Reduced Instruction Set Computing (RISC) using an ARM architecture, as commonly used today in mobile devices and other portable electronic devices. Of course, other processor circuitry may be used to form the CPU 430 or processor hardware in smartphones, laptop computers, and tablets.

The microprocessor 430 serves as a programmable host controller for the mobile device 390 and the wearable device 399 by configuring the mobile device 390 and the wearable device 399 to perform various operations, for example, in accordance with instructions or programming executable by processor 430. For example, such operations may include various general operations of the mobile device 390 or wearable device 399, as well as operations related to the gesture detection programming 345, wear/carry detection programming 386, or activity detection programming 302, and communications with the eyewear device 100 and server system 398. Although a processor may be configured by use of hardwired logic, typical processors in mobile devices are general processing circuits configured by execution of programming.

The mobile device 390 and the wearable device 399 include a memory or storage device system for storing data and programming. In the example, the memory system may include a flash memory 440A and a random access memory (RAM) 440B. The RAM 440B serves as short-term storage for instructions and data being handled by the processor 430, e.g., as a working data processing memory. The flash memory 440A typically provides longer-term storage. The mobile device 390 and the wearable device 399 can include a visible light camera 470.

In the example of the mobile device 390 or the wearable device 399, the flash memory 440A is used to store programming or instructions for execution by the processor 430. For speed and efficiency as previously explained, the gesture detection programming 345, wear/carry detection programming 386, or activity detection programming 302 of the eyewear device 100 may be implemented in firmware. Gesture detection programming 345, wear/carry detection programming 386, or activity detection programming 302 may be stored locally in a read-only memory (ROM), erasable programmable read-only memory (EPROM), or flash memory 440A.

Alternatively or additionally, the gesture detection programming 345, wear/carry detection programming 386, or activity detection programming 302 can be implemented in the application layer or have portions residing at the application layer, as described previously. Hence, depending on the type of device, the mobile device 390 and the wearable device 399 store and run a mobile operating system through which specific applications, including gesture detection programming 345, wear/carry detection programming 386, activity detection programming 302, and communications with the eyewear device 100 and server system 398, are executed. Applications may be a native application, a hybrid application, or a web application (e.g., a dynamic web page executed by a web browser) that runs on the mobile device 390 or wearable device 399. Examples of mobile operating systems include Google Android, Apple iOS (iPhone or iPad devices), Windows Mobile, Amazon Fire OS, RIM BlackBerry operating system, or the like.

FIG. 5 is a flowchart of a method that can be implemented in the neural network system 300 of FIGS. 3B, 3D, and 3F to optimize execution speed and efficiency of the generated neural network model for gesture, wear, carry, or activity identification. It should be understood that in other examples the training model can be generated in any language, interpreted or native. The training model can be generated on a CPU, GPU, FPGA, or ASIC; it does not matter how the model is generated as long as the weights of the network can be optimized and calculated based on the training data. Beginning in block 500, a trained scripting language model is generated in an interpreted programming language based on acquired training data 376A-N, 310A-N that includes tracked movement over time intervals of known, but unclassified, gestures 377A-N or activities 311A-N occurring while a wearable device 399 (e.g., eyewear device 100) is worn and also when the wearable device 399 is not being worn. For wear detection, the trained scripting language model is generated based on acquired training data 351A-N of tracked movement over time intervals during wearing of the wearable device 399 (e.g., eyewear device 100) and when the wearable device 399 is not being worn. For carry detection, the trained scripting language model is generated based on acquired training data 351A-N of tracked movement over time intervals during carrying of the mobile device 390 and when the mobile device 390 is not being carried.

In the gesture identification example, accelerometer training data is collected during double tap (or other multiple tap, such as a triple tap) gestures on either side of the eyewear device 100, using custom firmware for data collection. The last 800 ms, or other time interval(s), of accelerometer samples for the training data are collected in an embedded multi-media controller (EMMC), along with an indication of which lateral side 170A-B, chunk 110A-B, or other input surface 181 the tap gesture occurred on. This process is called training data labeling. For each individual data element, a label is created, typically by a human, that specifies the gesture. For example, for the double tap detector, the labels are none, left side, and right side. Additionally, a large number of samples of non-target gesture training data is collected; this is the null class. In some examples, this data can be split into a training and a test data set. The convolutional neural network (CNN) is trained on this acquired training data (using a 1-dimensional convolution layer over the temporal dimension of the previously mentioned features). Keras, a high-level neural network API written in Python and capable of running on top of TensorFlow, CNTK, or Theano, can be used to train the model and generate the trained scripting language gesture model. Keras is just one example of a framework that can be used to train neural networks; TensorFlow, CNTK, or Theano can be used directly, as can any other neural network training framework.
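A short sketch of this labeling step, assuming a 100 Hz accelerometer stream pulled from the EMMC (the sample rate and function names are assumptions), is:

    import numpy as np

    LABELS = {"none": 0, "left": 1, "right": 2}    # double-tap labels noted above
    SAMPLE_RATE_HZ = 100                            # assumed accelerometer rate
    WINDOW = int(0.8 * SAMPLE_RATE_HZ)              # last 800 ms of samples

    def label_training_window(accel_xyz, label_name):
        # accel_xyz: (timesteps, 3) accelerometer samples from the EMMC log.
        # Keep only the trailing 800 ms and attach the human-assigned label.
        window = np.asarray(accel_xyz, dtype=np.float32)[-WINDOW:]
        return window, LABELS[label_name]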

In the wear or activity identification example, the inertial measurement unit (IMU) features are collected as training data during wear/no-wear sessions, using custom firmware to save the IMU training data with a label to the EMMC in comma separated value (CSV) format indicating the type of IMU measurement. The IMU features (e.g., training data) include: acceleration (X, Y, Z); gyroscope (pitch, yaw, roll) rotational acceleration; time (e.g., milliseconds) since the last motion interrupt (e.g., an interrupt fires if motion occurs after 4 seconds of no motion); and time (e.g., milliseconds) since the last no-motion interrupt (an interrupt fires if the device is idle for 4 consecutive seconds). Each of these features is sampled at a sampling rate frequency of 100 Hz. Users of the wearable device 399 (e.g., eyewear device 100) collect data during various activities or settings of wearing or not wearing the device, for example: biking (e.g., wearing the eyewear device 100 or having the eyewear device 100 in a backpack), swimming, jogging, and driving. The wear training data is downloaded from the eyewear device 100 and split into a training and a test data set. The convolutional neural network (CNN) is trained on the data (using a 1-dimensional convolution layer over the temporal dimension of the previously mentioned training data features). The following parameters can be utilized with a randomized grid search. First, a sampling rate (100 Hz is faster than needed, so down sampling can be used as needed); for example, a 300 ms sampling interval can be used. The slower the sampling rate, the less memory needed for a single model evaluation (but at a point performance takes a hit). Second, a window size (i.e., how many consecutive samples are examined at a given time); for example, 6 seconds of samples can be used. Third, various neural network hyperparameters (e.g., number of layers, activation functions, loss functions, etc.) can be utilized. Keras can be used to train the model and generate the trained scripting language model.
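For example, the down sampling and windowing of the 100 Hz IMU features might be prepared as in the following sketch; the exact preprocessing is not specified in the description above.

    import numpy as np

    def downsample_and_window(features_100hz, period_ms=300, window_s=6.0):
        # features_100hz: (timesteps, channels) IMU features sampled at 100 Hz.
        step = period_ms // 10                     # 100 Hz = one sample per 10 ms
        slow = features_100hz[::step]              # keep every step-th sample
        per_window = int(window_s * 1000 / period_ms)    # e.g., 20 samples
        n = len(slow) // per_window
        return np.reshape(slow[:n * per_window],
                          (n, per_window, features_100hz.shape[1]))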

Once a satisfactory trained scripting language model for gesture, wear, carry, or activity detection is generated, the next challenge is deploying that trained scripting language model to a microcontroller (where computing resources are limited, e.g., extremely limited flash and RAM memory). This means that the number of weights in the model, and how much data is needed in memory at any given time, is very important.

Moving to block 510, a calibrated set of gesture weights 347A-N, wear/carry weights 388A-N, or activity weights 304A-N is extracted from the trained scripting language model. To address the limited computing resources of the microcontroller of the wearable device 399 or the mobile device 390, a generic framework that consists of two main parts can be utilized. First, a Python script takes the trained scripting language model (e.g., Keras based) for gesture, wear, carry, or activity detection and generates C code of the weights, and a C application programming interface (API) to perform the inference step.
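The weight-export step of such a script can be sketched as follows; the emitted identifiers and formatting are illustrative and do not represent the actual generated C API.

    import numpy as np

    def export_weights_to_c(weight_table, var_name="gesture_weights"):
        # weight_table: list of NumPy arrays taken from the trained Keras model.
        # Emits a C source fragment declaring one flat const array per tensor.
        lines = []
        for i, w in enumerate(weight_table):
            flat = ", ".join(f"{v:.6f}f" for v in np.asarray(w).ravel())
            lines.append(
                f"static const float {var_name}_{i}[{w.size}] = {{ {flat} }};")
        return "\n".join(lines)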

Hence, continuing to block 520, a trained inference code model for gesture, wear/carry, or activity detection is generated in a compiled programming language based on the trained scripting language model. For example, the trained scripting language model generated with Keras is written in Python and is passed into a Keras exporter, which generates an exported C code model. In addition, a C code static library implementation of many common neural network abstractions (e.g., dense layers, convolutional layers, various activation functions, etc.) is written in C code and is used by the exported C code model. While Keras2CPP is an open-source tool that can port Keras Python models into C++ code, the ported code includes a lot of modern C++ (including many Standard Template Library (STL) constructs, dynamic memory allocation, etc.) and the library code is too large to easily fit in many modern microcontrollers. The trained inference code model in the C language implementation requires no dynamic memory allocation, and the code size is more than an order of magnitude smaller (with the same compilation flags, optimizing for size, etc.).

Moving to block 530, the trained inference code model is compiled into a trained machine inference code model. For gesture detection, the size of the compiled Keras2CPP library is 14,128 bytes of text, 2,330 bytes of data, and 1 byte of statically allocated data, for a total of 16,479 bytes. The size of the compiled machine code from the C code static library implementation of common neural network abstractions is 444 bytes of text and 0 bytes of data, for a total of 444 bytes, which dramatically reduces the memory required for the gesture detection programming 345, wear/carry detection programming 386, or activity detection programming 302 loaded in the wearable device (e.g., eyewear device 100) or mobile device 390. This is just one example, and the size reduction varies quite a bit from model to model; however, the takeaway is that this procedure enables deployment of neural networks to microcontrollers using only a few hundred bytes of flash and a few hundred bytes of RAM.

Proceeding now to block 540, the extracted calibrated set of gesture weights 347A-N, wear/carry weights 388A-N, or activity weights 304A-N are quantized. Quantization converts weights that are stored in floating point format (e.g., at least 32-bit IEEE standard floating point) into 8-bit numbers, which reduces the memory required for the gesture detection programming 345, wear/carry detection programming 386, or activity detection programming 302 by a factor of 4 times (4×) with negligible performance degradation.
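A simple linear quantization scheme consistent with this description might look like the following sketch; the exact quantization scheme used is not specified above.

    import numpy as np

    def quantize_8bit(weights):
        # Affine quantization of float weights to uint8 (roughly a 4x reduction
        # relative to 32-bit floats); dequantize with q * scale + w_min.
        w = np.asarray(weights, dtype=np.float32)
        w_min, w_max = float(w.min()), float(w.max())
        scale = (w_max - w_min) / 255.0 or 1.0     # avoid divide-by-zero
        q = np.round((w - w_min) / scale).astype(np.uint8)
        return q, scale, w_min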

Finishing now in block 550, the trained inference machine code model and the quantized calibrated set of gesture weights 347A-N, wear/carry weights 388A-N, or activity weights 304A-N are loaded in statically allocated memory of the wearable device 399 (e.g., eyewear device 100) or the mobile device 390, as opposed to dynamically allocated memory. These components comprise the gesture detection programming 345, wear/carry detection programming 386, or activity detection programming 302.

Once the optimized C code neural network model is generated and compiled into machine code, the model is called from application firmware to identify gestures, detect wear of the wearable device 399 (e.g., eyewear device 100) or carrying of the mobile device 390, and identify any activities performed during wearing of the wearable device 399. For gesture detection, on a tap motion interrupt 367A-N from the IMU, the last 800 ms of accelerometer measurements 361A-N from the movement tracker 118 are run through the trained machine code gesture model 346 to detect double tap gestures on either side of the eyewear device 100. Based on the side of the double tap on the eyewear device 100, a different operation is implemented in the trained machine code gesture model 346. An example generated C code API for double tap detection takes accelerometer measurements 361A-N over time as an input layer array from the last 800 milliseconds (ms) and outputs a probability (confidence level 371A-N) output layer array for three classes: no double tap, left temple 125A double tap, and right temple 125B double tap. An example generated C code API for wear detection takes accelerometer measurements 361A-N, gyroscope measurements 364A-N, and motion data (e.g., motion interrupt time coordinates 367A-N) from the IMU over time as an input layer array and outputs an output layer that is a single probability (confidence level 392) that the eyewear device 100 is being worn. An example generated C code API for activity detection takes accelerometer measurements 361A-N, gyroscope measurements 364A-N, and motion data (e.g., motion interrupt time coordinates 367A-N) from the IMU over time as an input layer array and outputs a probability output layer array that includes 4 classes 307A-N and confidence levels 308A-N: biking, swimming, jogging, and driving. The input (e.g., 301A-N) and output (306A-N) layer arrays are statically allocated instead of dynamically allocated to improve runtime. Of course, the size of the input and output layer arrays will vary depending on the number of gestures and activities.
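The double tap path, for example, can be sketched at the application level as follows (Python shown for readability; run_gesture_model stands in for the generated C inference API and is an assumed name, as is the 0.5 threshold).

    DOUBLE_TAP_CLASSES = ("no_double_tap", "left_temple", "right_temple")

    def on_tap_motion_interrupt(accel_window_800ms, run_gesture_model, threshold=0.5):
        # accel_window_800ms: last 800 ms of accelerometer measurements 361A-N.
        probs = run_gesture_model(accel_window_800ms)    # three confidence levels
        best = max(range(len(probs)), key=probs.__getitem__)
        if best != 0 and probs[best] >= threshold:
            return DOUBLE_TAP_CLASSES[best], probs[best]
        return None, probs[best]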

FIGS. 6-9 illustrate several examples of multiple finger contact detected touch events and identified finger gestures on an input surface 181 of the eyewear device 100. In each of the examples of FIGS. 6-9, the function to detect the at least one touch event 349A-N on the input surface 181 based on the at least one finger contact 179 inputted from the user includes the following functions. Detect a first touch event 349A on the input surface 181 based on a first taken measurement 362A, 365A at a first time coordinate 363A, 366A, 367A corresponding to a first finger contact 179A inputted from the user at a first input time. Detect a second touch event 349B on the input surface 181 based on a second taken measurement 362B, 365B taken at a second time coordinate 363B, 366B, 367B corresponding to a second finger contact 179B inputted from the user at a second input time within the tracked time period. The function to identify the finger gesture is based on the first and second detected touch events 349A-B, the first time coordinate 363A, 366A, 367A, and the second time coordinate 363B, 366B, 367B.
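As one sketch of this identification step, the two touch events and their time coordinates could be compared along the following lines; the event kinds and the 500 ms boundary are assumed values, not taken from the description.

    def identify_two_event_gesture(first_event, second_event, hold_threshold_ms=500):
        # first_event, second_event: (kind, time_ms) tuples for the first and
        # second detected touch events 349A-B on the input surface 181.
        kind_1, t_1 = first_event
        kind_2, t_2 = second_event
        gap_ms = t_2 - t_1
        if kind_1 == "press" and kind_2 == "hold" and gap_ms >= hold_threshold_ms:
            return "press_and_hold"
        if kind_1 == "tap" and kind_2 == "tap" and gap_ms < hold_threshold_ms:
            return "double_tap"
        return None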

FIGS. 6A-C illustrate press and hold detected touch events on the input surface 181. As shown, multiple finger contacts occur on the input surface 181, which include pressing (the first finger contact 610A), holding (the second finger contact 610B), and no finger contact 610C by releasing the input surface 181. Accordingly, the first and second detected touch events are a press and hold on the input surface 181. The identified finger gesture is a press and hold of a graphical user interface element in the image presented on the image display. The adjustment to the image presented on the image display based on the identified finger gesture is configured to allow a drag and drop (e.g., move) of the graphical user interface element on the image display or provide display options (e.g., a context menu associated with the graphical user interface element).

FIG. 7 illustrates finger pinching and unpinching detected touch events on the input surface 181. Multiple finger contacts occur on the input surface 181, in which two fingers (first finger contact 710A and second finger contact 710B) move apart from each other (finger unpinching) or move toward each other (finger pinching). In the finger pinching detected touch event example, the first and second detected touch events are finger pinching on the input surface 181. The identified finger gesture is a zoom in of the image presented on the image display. The adjustment to the image presented on the image display based on the identified finger gesture zooms in on the image presented on the image display.

In the finger unpinching detected touch event example, the first and second detected touch events are finger unpinching on the input surface 181. The identified finger gesture is a zoom out of the image presented on the image display. The adjustment to the image presented on the image display based on the identified finger gesture zooms out of the image presented on the image display.

FIG. 8 illustrates finger rotation detected touch events on the input surface 181. As shown, multiple finger contacts occur on the input surface 181, which include continuously rotating two fingers in a circle from two initial points, a first finger contact 810A and a second finger contact 810B, to two final points of contact for those two fingers. In some examples, only one finger may be rotated in a circle. The first and second detected touch events are finger rotation on the input surface 181. The identified finger gesture is a finger rotation of the image presented on the image display. The adjustment to the image presented on the display based on the identified finger gesture rotates the image presented on the image display, for example, to rotate a view. The rotation gesture can occur when two fingers rotate around each other.

FIG. 9 illustrates finger swiping detected touch events on the input surface 181. As shown, multiple finger contacts occur on the input surface 181, which include dragging one finger left or right from a point of initial finger contact 910A to a final point of second finger contact 910B or 910C. The first and second detected touch events are finger swiping from front to back (910A to 910C) or back to front (910A to 910B) on the input surface 181. The identified finger gesture is a scroll of the image presented on the image display. The adjustment to the image presented on the image display based on the identified finger gesture scrolls the image presented on the image display. As shown, such a scroll or swipe gesture can occur when the user moves one or more fingers across the screen in a specific horizontal direction without significantly deviating from the main direction of travel; however, it should be understood that the direction of travel can be vertical as well.

Any of the functions described herein for gesture, wear, activity, or carry detection of the wearable device (e.g., eyewear device 100), mobile device 390, and server system 398 can be embodied in one or more methods as method steps or in one or more applications as described previously. According to some embodiments, “programming,” an “application,” “applications,” or “firmware” are program(s) that execute functions defined in the program, such as logic embodied in software or hardware instructions. Various programming languages can be employed to create one or more of the applications, structured in a variety of manners, such as object-oriented programming languages (e.g., Objective-C, Java, or C++) or procedural programming languages (e.g., C or assembly language). In a specific example, a third party application (e.g., an application developed using the ANDROID™ or IOS™ software development kit (SDK) by an entity other than the vendor of the particular platform) may be mobile software running on a mobile operating system such as IOS™, ANDROID™, WINDOWS® Phone, or another mobile operating system. In this example, the third party application can invoke API calls provided by the operating system to facilitate functionality described herein. The applications can be stored in any type of computer readable medium or computer storage device and be executed by one or more general-purpose computers. In addition, the methods and processes disclosed herein can alternatively be embodied in specialized computer hardware or an application specific integrated circuit (ASIC), field programmable gate array (FPGA), or a complex programmable logic device (CPLD).

Program aspects of the technology may be thought of as “products” or“articles of manufacture” typically in the form of executable codeand/or associated data that is carried on or embodied in a type ofmachine-readable medium. For example, programming code could includecode for the fingerprint sensor, user authorization, navigation, orother functions described herein. “Storage” type media include any orall of the tangible memory of the computers, processors or the like, orassociated modules thereof, such as various semiconductor memories, tapedrives, disk drives and the like, which may provide non-transitorystorage at any time for the software programming. All or portions of thesoftware may at times be communicated through the Internet or variousother telecommunication networks. Such communications, for example, mayenable loading of the software from one computer or processor intoanother, for example, from the server system 398 or host computer of theservice provider into the computer platforms of the wearable device 399(e.g., eyewear device 100) and mobile device 390. Thus, another type ofmedia that may bear the programming, media content or meta-data filesincludes optical, electrical and electromagnetic waves, such as usedacross physical interfaces between local devices, through wired andoptical landline networks and over various air-links. The physicalelements that carry such waves, such as wired or wireless links, opticallinks or the like, also may be considered as media bearing the software.As used herein, unless restricted to “non-transitory”, “tangible”, or“storage” media, terms such as computer or machine “readable medium”refer to any medium that participates in providing instructions or datato a processor for execution.

Hence, a machine-readable medium may take many forms of tangible storage medium. Non-volatile storage media include, for example, optical or magnetic disks, such as any of the storage devices in any computer(s) or the like, such as may be used to implement the client device, media gateway, transcoder, etc. shown in the drawings. Volatile storage media include dynamic memory, such as the main memory of such a computer platform. Tangible transmission media include coaxial cables, copper wire, and fiber optics, including the wires that comprise a bus within a computer system. Carrier-wave transmission media may take the form of electric or electromagnetic signals, or acoustic or light waves such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media therefore include, for example: a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a PROM and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer may read programming code and/or data. Many of these forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to a processor for execution.

The scope of protection is limited solely by the claims that now follow.That scope is intended and should be interpreted to be as broad as isconsistent with the ordinary meaning of the language that is used in theclaims when interpreted in light of this specification and theprosecution history that follows and to encompass all structural andfunctional equivalents. Notwithstanding, none of the claims are intendedto embrace subject matter that fails to satisfy the requirement ofSections 101, 102, or 103 of the Patent Act, nor should they beinterpreted in such a way. Any unintended embracement of such subjectmatter is hereby disclaimed.

Except as stated immediately above, nothing that has been stated orillustrated is intended or should be interpreted to cause a dedicationof any component, step, feature, object, benefit, advantage, orequivalent to the public, regardless of whether it is or is not recitedin the claims.

It will be understood that the terms and expressions used herein havethe ordinary meaning as is accorded to such terms and expressions withrespect to their corresponding respective areas of inquiry and studyexcept where specific meanings have otherwise been set forth herein.Relational terms such as first and second and the like may be usedsolely to distinguish one entity or action from another withoutnecessarily requiring or implying any actual such relationship or orderbetween such entities or actions. The terms “comprises,” “comprising,”“includes,” “including,” or any other variation thereof, are intended tocover a non-exclusive inclusion, such that a process, method, article,or apparatus that comprises or includes a list of elements or steps doesnot include only those elements or steps but may include other elementsor steps not expressly listed or inherent to such process, method,article, or apparatus. An element preceded by “a” or “an” does not,without further constraints, preclude the existence of additionalidentical elements in the process, method, article, or apparatus thatcomprises the element.

Unless otherwise stated, any and all measurements, values, ratings,positions, magnitudes, sizes, and other specifications that are setforth in this specification, including in the claims that follow, areapproximate, not exact. Such amounts are intended to have a reasonablerange that is consistent with the functions to which they relate andwith what is customary in the art to which they pertain. For example,unless expressly stated otherwise, a parameter value or the like mayvary by as much as ±10% from the stated amount.

In addition, in the foregoing Detailed Description, it can be seen thatvarious features are grouped together in various examples for thepurpose of streamlining the disclosure. This method of disclosure is notto be interpreted as reflecting an intention that the claimed examplesrequire more features than are expressly recited in each claim. Rather,as the following claims reflect, the subject matter to be protected liesin less than all features of any single disclosed example. Thus, thefollowing claims are hereby incorporated into the Detailed Description,with each claim standing on its own as a separately claimed subjectmatter.

While the foregoing has described what are considered to be the bestmode and other examples, it is understood that various modifications maybe made therein and that the subject matter disclosed herein may beimplemented in various forms and examples, and that they may be appliedin numerous applications, only some of which have been described herein.It is intended by the following claims to claim any and allmodifications and variations that fall within the true scope of thepresent concepts.

What is claimed is:
1. A method for generating a neural network model for a wearable device, the method comprising: generating a trained model using training data acquired from the wearable device; extracting a calibrated set of weights from the trained model; generating a trained inferencing code model based on the trained model; compiling the trained inferencing code model into a trained machine inferencing code model; quantizing the calibrated set of weights; and loading the trained inferencing machine code model and the quantized calibrated set of weights onto the wearable device.
2. The method of claim 1, wherein the trained model is a trained scripting language model and wherein generating the trained model comprises: generating the trained scripting language model in an interpreted programming language based on the acquired training data.
3. The method of claim 2, wherein the inferencing comprises: generating C language code based on the trained scripting language model.
4. The method of claim 2, wherein the acquired training data includes at least one of: tracked movement of the wearable device over time intervals occurring while the wearable device is being carried and while the wearable device is not being carried; or tracked movement of the wearable device over time intervals occurring while the wearable device is being worn and while the wearable device is not being worn.
5. The method of claim 2, wherein the acquired training data includes tracked movement of the wearable device over time intervals in response to known and unclassified gestures or activities occurring while the wearable device is being worn and while the wearable device is not being worn.
6. The method of claim 2, wherein the wearable device is an eyewear device having a left side and a right side and wherein the generating the trained model comprises: creating labels for each element of training data, the labels comprising left corresponding to the left side of the eyewear device and right corresponding to the right side of the eyewear device.
7. The method of claim 1, wherein the calibrated set of weights are extracted in a floating point format and wherein the quantizing comprises: converting the calibrated set of weights from the floating point format to a fixed number of bits format.
8. The method of claim 2, wherein the wearable device comprises statically allocated memory and wherein the loading comprises: loading the trained inferencing machine code model and the quantized calibrated set of weights into the statically allocated memory of the wearable device.
9. The method of claim 2, wherein the wearable device comprises flash memory and random access memory (RAM) and wherein the loading comprises: loading the trained inferencing machine code model and the quantized calibrated set of weights into at least one of the flash memory or RAM of the wearable device.
10. A wearable device comprising a neural network model, the neural network model generated by: generating a trained model using training data acquired from the wearable device; extracting a calibrated set of weights from the trained model; generating a trained inferencing code model based on the trained model; compiling the trained inferencing code model into a trained machine inferencing code model; and quantizing the calibrated set of weights; wherein the neural network model comprises the trained inferencing machine code model and the quantized calibrated set of weights.
11. The wearable device of claim 10, wherein the trained model is a trained scripting language model and wherein generating the trained model comprises: generating the trained scripting language model in an interpreted programming language based on the acquired training data.
12. The wearable device of claim 11, wherein the inferencing comprises: generating C language code based on the trained scripting language model.
13. The wearable device of claim 11, wherein the acquired training data includes at least one of: tracked movement of the wearable device over time intervals occurring while the wearable device is being carried and while the wearable device is not being carried; or tracked movement of the wearable device over time intervals occurring while the wearable device is being worn and while the wearable device is not being worn.
14. The wearable device of claim 11, wherein the acquired training data includes tracked movement of the wearable device over time intervals in response to known and unclassified gestures or activities occurring while the wearable device is being worn and while the wearable device is not being worn.
15. The wearable device of claim 11, wherein the wearable device is an eyewear device having a left side and a right side and wherein the generating the trained model comprises: creating labels for each element of training data, the labels comprising left corresponding to the left side of the eyewear device and right corresponding to the right side of the eyewear device.
16. The wearable device of claim 10, wherein the calibrated set of weights are extracted in a floating point format and wherein the quantizing comprises: converting the calibrated set of weights from the floating point format to a fixed number of bits format.
17. The wearable device of claim 11, further comprising statically allocated memory, wherein the trained inferencing machine code model and the quantized calibrated set of weights are loaded into the statically allocated memory of the wearable device.
18. The wearable device of claim 11, further comprising flash memory and random access memory (RAM), wherein the trained inferencing machine code model and the quantized calibrated set of weights are loaded into at least one of the flash memory or RAM of the wearable device.
19. A non-transitory computer readable media storing instructions for generating a neural network model for use with a wearable device, the instructions, when executed by a processor, perform functions comprising functions to: generate a trained model using training data acquired from the wearable device; extract a calibrated set of weights from the trained model; generate a trained inferencing code model based on the trained model; compile the trained inferencing code model into a trained machine inferencing code model; and quantize the calibrated set of weights; wherein the neural network model comprises the trained inferencing machine code model and the quantized calibrated set of weights.
20. The non-transitory computer readable media of claim 19, wherein the trained model is a trained scripting language model and wherein to generate the trained model the instructions, when executed by the processor, perform a function to: generate the trained scripting language model in an interpreted programming language based on the acquired training data.